Different types of data including voice data of a user, image
data produced by picturing the mouth of the user, and
ambient noise data are provided through an input unit 10.
Those data are analyzed by preprocessors 20 to 23, respectively,
to determine characteristic parameters. In a classification
data constructing unit 24, classification data is constructed
from the characteristic parameters and transferred to
a classification unit 25 for classification. Meanwhile, an 
integrated parameter constructing unit 26 constructs inte- 
grated parameters from the characteristic parameters pro- 
vided by the preprocessors 20 to 23. An adaptivity deter- 
mining unit 27 selects a table corresponding to the class 
determined by the classification unit 25. From the standard 
parameters saved in the table and the integrated parameter 
from the integrated parameter constructing unit 26, the voice 
emitted by a user is recognized. Accordingly, the accuracy of 
the voice recognition will be increased. 

12 Claims, 9 Drawing Sheets 
APPARATUS AND METHOD FOR
RECOGNITION AND APPARATUS AND 
METHOD FOR LEARNING 

This is a continuation of copending International Application
PCT/JP97/04755 having an international filing date
of Dec. 22, 1997.

TECHNICAL FIELD 

The present invention relates to an apparatus and a
method for recognition and an apparatus and a method for
learning. More particularly, the present invention relates to
an apparatus and a method for recognition and an apparatus
and a method for learning in which, when recognizing e.g.
sounds and objects, other data is utilized in addition to their
audio and video data to increase the recognition accuracy.

BACKGROUND ART 

In a conventional voice recognition apparatus for recognizing
voice sounds, voice data picked up by a microphone
is (acoustically) analyzed and an analyzed result is used to
recognize the voice emitted by a user.

However, such a conventional voice recognition apparatus
relies solely on the analyzed result of the voice data picked up
by the microphone for voice recognition, so its recognition
accuracy is limited to a certain level.

It should be understood that not only the voice data picked
up by a microphone but also other factors, such as the
expression and the movement of the mouth of a subject, are
significant and should be taken into account in recognizing
the voice of the subject.

A voice recognition apparatus is rarely used in a controlled
environment, e.g. a sound-proof chamber, where only the voice
of a subject is picked up by a microphone; it is normally used
under hostile conditions where different types of noise are
received. In particular, a recent navigation system may be
equipped with such a voice recognition apparatus, which
however receives unwanted noise sounds, including sounds of
a CD (compact disc) player, an engine, and an air-conditioner
mounted in a vehicle, in addition to the voice of the subject
to be recognized. Since it is very difficult to remove noise
sounds from the voice data, the voice recognition has to take
the noise sounds into account to improve its accuracy.

It is also common in the conventional voice recognition
apparatus that the voice data picked up by a microphone is
processed in a specific manner to determine characteristic
parameters and the voice recognition is carried out by
calculating the distance between the characteristic parameters
plotted in a parameter space. As a rule, the characteristic
parameters which are essential for the voice recognition
vary depending on the conditions under which the voice
recognition apparatus is used.

DISCLOSURE OF THE INVENTION 

The present invention is directed towards overcoming the
foregoing drawbacks and its object is to increase the recognition
accuracy of a recognition apparatus for recognizing
voice or other factors.

A recognition apparatus, as defined in claim 1, comprises: 
a first classifying means for classifying different types of 
input data into classes depending on their characteristics; an 
integrated parameter constructing means for constructing an
integrated parameter through integrating the different types 
of input data; a standard parameter saving means for saving 
tables, each table carrying standard parameters and assigned
to one of the classes determined by the first classifying 
means; and a recognizing means for recognizing a given 
subject using the integrated parameter and the standard 
parameters listed in the table assigned to the class deter- 
mined by the first classifying means. 

A recognition method, as defined in claim 5, comprises 
the steps of: classifying different types of input data into 
classes depending on their characteristics and constructing 
an integrated parameter through integrating the different 
types of input data; and recognizing a given subject using the 
integrated parameter and a table carrying standard param- 
eters and assigned to one of the classes determined by the 
classification. 

A learning apparatus, as defined in claim 6, comprises: a 
first classifying means for classifying different types of input 
data into classes depending on their characteristics; an 
integrated parameter constructing means for constructing an 
integrated parameter through integrating the different types 
of input data; and a classifying means for classifying the 
integrated parameters according to the class determined by 
the first classifying means. 

A learning method, as defined in claim 9, comprises the 
steps of: classifying different types of input data into classes 
depending on their characteristics and constructing an inte- 
grated parameter through integrating the different types of 
input data; and classifying the integrated parameters accord- 
ing to the class determined by the classification. 

In the recognition apparatus defined in claim 1, the first 
classifying means classifies the different types of input data 
into classes depending on their characteristics and also, the 
integrated parameter constructing means constructs an inte- 
grated parameter through integrating the different types of 
input data. The standard parameter saving means includes 
tables, each table carrying standard parameters and assigned 
to one of the classes determined by the first classifying 
means. The recognizing means thus recognizes a given 
subject using the integrated parameter and the standard 
parameters listed in the table assigned to the class deter- 
mined by the first classifying means. 

In the recognition method defined in claim 5, different 
types of input data are classified into classes depending on 
their characteristics and an integrated parameter is con- 
structed through integrating the different types of input data. 
Then, a given subject can be recognized using the integrated 
parameter and a table carrying standard parameters and 
assigned to one of the classes determined by the classifica- 
tion. 

In the learning apparatus defined in claim 6, the first 
classifying means classifies different types of input data into 
classes depending on their characteristics and the integrated 
parameter constructing means constructs an integrated 
parameter through integrating the different types of input 
data. The classifying means also classifies the integrated 
parameters according to the class determined by the first 
classifying means. 

In the learning method defined in claim 9, different types 
of input data are classified into classes depending on their 
characteristics and an integrated parameter is constructed by 
integrating the different types of input data. The integrated 
parameters are then classified according to the class deter- 
mined by the classification. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram showing an arrangement of a 
navigation system according to the present invention; 
FIG. 2 is a block diagram of an arrangement of a first
embodiment of a voice recognition apparatus according to 
the present invention; 

FIG. 3 is a diagram explaining a process in a preprocessor 
unit 21;

FIG. 4 is a block diagram of an arrangement of a first 
embodiment of a learning apparatus according to the present 
invention;

FIG. 5 is a diagram showing a parameter space; 

FIG. 6 is a block diagram of an arrangement of a second
embodiment of the voice recognition apparatus according to 
the present invention; 

FIG. 7 is a block diagram of an arrangement of a second 
embodiment of the learning apparatus according to the 
present invention; 

FIG. 8 is a block diagram of an arrangement of a third 
embodiment of the voice recognition apparatus according to 
the present invention; and 

FIG. 9 is a block diagram of an arrangement of a third
embodiment of the learning apparatus according to the 
present invention. 

BEST MODE FOR CARRYING OUT THE 
INVENTION 

FIG. 1 illustrates an arrangement of a navigation system 
according to the present invention. 

The navigation system, which may be provided in a
vehicle, comprises a system controller unit 1, a position
measuring device 2, a database device 3, an input device 4,
and an output device 5, and can be controlled by operating
e.g. button switches or through speech inputs in a dialogue
mode. The navigation system may also be of a portable
type.

The system controller unit 1 exchanges data with each
block in the system to control the overall operation of
the system. The position measuring device 2 receives a
radio wave from a GPS (Global Positioning System)
satellite and measures the current position with a measuring
device such as a gyroscope or a vehicle speed sensor. The
database device 3 holds (saves) map information in an
electronic format and other relevant data required for
navigation, which can be retrieved in response to a command
from the system controller unit 1 and supplied to the system
controller unit 1.

The input device 4 includes button switches or a joystick
for operating the navigation system, a microphone for entering
voice data, a CCD (Charge Coupled Device) camera for
picturing a user, an acceleration sensor for detecting vibration
of the vehicle, sensors for measuring the moisture and
the temperature, and other relevant sensors. An output signal
of the input device 4 operated by the button switches or the
joystick is transferred to the system controller unit 1. Also,
the input device 4 includes a voice recognition device for
recognizing voice components in an input sound and delivering
its resultant data to the system controller unit 1.

The output device 5 includes, for example, a liquid crystal
display monitor or a CRT (Cathode Ray Tube) for displaying
an image and the like, a speaker(s) for emitting speech and
the like, and a voice mixer device for generating synthesized
speech from text data, and can control a display of map
information or the current position and an output of speech.
The output device 5, when receiving text data from the
system controller unit 1, can synthesize the corresponding
speech in the voice mixer device.

In the navigation system having the above mentioned
arrangement, when the user speaks the name of a location
as the destination, the voice is recognized by the voice
recognition device mounted in the input device 4 and the
resulting voice data is transferred to the system controller
unit 1. The system controller unit 1, upon receiving the voice
data of the destination, recognizes the current position from
an output of the position measuring device 2 and accesses the
map information saved in the database device 3 to determine a
route from the current position to the destination. The
system controller unit 1 transfers the route together with its
relevant map information to the output device 5 for display
and simultaneously delivers voice data for announcing the
route to the voice mixer device of the output device 5.

This allows the user to arrive at the destination without
difficulty.

FIG. 2 illustrates an arrangement of a first embodiment of 
the voice recognition device mounted in the input device 4 
shown in FIG. 1. 

An input unit 10 comprises a microphone 11, a CCD
camera 12, another microphone 13, a sensor 14, an amplifier
15, an A/D converter 16, another amplifier 17, and A/D
converters 18 and 19, and outputs various input data used
for recognition of voices of the user as a driver.

More specifically, the microphone 11 may be of a direc- 
tional type and pointed to the user who is the driver. The 
voice of the user is picked up mostly by the microphone 11. 
The voice picked up by the microphone 11 is converted to 
an audio signal which is then amplified by the amplifier 15 
and transferred to the A/D converter 18. In the A/D converter 
18, the audio signal of analog form supplied from the 
amplifier 15 is sampled by a given sampling clock and 
quantized to particular quantizing steps so that it can be 
converted to a digital signal of audio data. The audio data is 
transmitted from the A/D converter 18 to a preprocessor 20. 

The CCD camera 12 is located to picture the mouth of the
user. The mouth of the user pictured by the CCD camera 12
is converted to a video signal which is transferred to the A/D 
converter 16. The A/D converter 16 like the A/D converter 
18 converts the video signal of analog form to an image data 
which is then transmitted to a preprocessor 21. 

The microphone 13 may be of non-directional type for 
receiving sounds other than the voice of the user. For 
example, picked up are ambient sounds from an engine, 
from a radio receiver or a CD player mounted in the vehicle, 
and from an air-conditioner, and when a window is opened, 
external noise. The sounds picked up by the microphone 13
are processed by the amplifier 17 and the A/D converter 19 in
the same manner as by the amplifier 15 and the A/D
converter 18. As a result, the sounds are converted to
audio data and supplied to a preprocessor 22.

The sensor 14 may be an acceleration sensor for detecting 
vibration of the vehicle or a sensor for measuring the 
moisture or the temperature and its output is transferred to 
a preprocessor 23. An output of the acceleration sensor 
represents the level (or amplitude) of noise caused by the 
vibration of the vehicle. An output of the sensor for measuring
the moisture or the temperature is used to determine whether
it is raining. If it is raining, the level of the rain sound is
calculated.

In the preprocessors 20 to 23, their respective data 
received are analyzed to determine characteristic parameters 
indicative of characteristics of the data. 

More particularly, the preprocessors 20 and 22 calculate
from the audio data zero-cross values, power levels at each
frequency band, linear predictive coefficients, cepstrum
coefficients, and other parameters on the basis of each audio
frame as a time unit. They are transmitted as the characteristic
parameters to a classification data construction unit 24
and an integrated parameter construction unit 26.
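Two of the per-frame parameters named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the non-overlapping framing, and the use of a single mean power per frame (rather than power levels in each frequency band) are choices made for the example and are not taken from the specification.

```python
def split_frames(samples, frame_len):
    """Cut a sample sequence into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption made for this sketch).
    """
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def zero_crossings(frame):
    """Count sign changes between adjacent samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def mean_power(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def frame_features(samples, frame_len):
    """Return one (zero-cross count, mean power) pair per audio frame."""
    return [(zero_crossings(f), mean_power(f)) for f in split_frames(samples, frame_len)]
```

Each pair returned by `frame_features` plays the role of a small characteristic-parameter vector for one audio frame.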

In the preprocessor 21, the horizontal length L1 and the
vertical length L2 of the mouth shown in FIG. 3 may be
calculated from the video data representing the mouth of the
user and a ratio of L1/L2 is supplied as a characteristic
parameter to the classification data construction unit 24 and
the integrated parameter construction unit 26. Alternatively,
the preprocessor 21 may calculate from the video data of the
mouth of the user a motion vector, edge values, and DCT
(discrete cosine transform) coefficients which are then
transferred as the characteristic parameters to the classification
data construction unit 24 and the integrated parameter
construction unit 26.

In the preprocessor 23, the characteristic parameters 
including the level (or amplitude) of noise generated by the 
vibration of the vehicle and the level of raining sound may 
be determined through analyzing the output of the sensor 14. 
Those parameters are also transferred to the classification 
data construction unit 24 and the integrated parameter 
construction unit 26. 

From the classification data construction unit 24, at least
some of the characteristic parameters received from the
preprocessors 20 to 23 are delivered as classification data
used for classification to a classification unit 25. In the
classification unit 25, the classification data received from
the classification data construction unit 24 are classified
depending on their pertinent characteristics.

More specifically, the classification unit 25 delivers a 
value assigned to a pattern of the characteristic parameters 
of the classification data as the class of classification data to 
an adaptivity determining unit 27. 

Assuming that a characteristic parameter is expressed by
A bits and a classification data consists of a B number of the
characteristic parameters, the number of patterns of the
characteristic parameters of the classification data is $(2^A)^B$.
Accordingly, when either A or B is great, the number of
classes becomes enormous and it is hence very difficult to
handle them at high speed.
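As a worked illustration of how quickly the class count grows, assume (purely for the sake of the example; these values are not given in the specification) that each characteristic parameter has A = 8 bits and the classification data holds B = 4 parameters:

```latex
\[
(2^{A})^{B} = (2^{8})^{4} = 2^{32} = 4{,}294{,}967{,}296 \ \text{patterns}.
\]
```

Even this small configuration would require more than four billion classes, which is why a bit-reduction step is applied before the classification.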

For reducing the number of bits of the characteristic 
parameters of each classification data, a proper technique 
such as ADRC (Adaptive Dynamic Range Coding) is used 
as the preprocess before the classification. 

The ADRC process starts with detecting the highest 
(referred to as a maximum characteristic parameter 
hereinafter) and the lowest (referred to as a minimum 
characteristic parameter hereinafter) of the B characteristic 
parameters of the classification data. Then, a difference DR 
between the maximum characteristic parameter MAX and 
the minimum characteristic parameter MIN is calculated 
(=MAX-MIN) and treated as a local dynamic range in the 
classification data. According to the dynamic range DR, 
each characteristic parameter of the classification data is 
quantized to a numeral of C bits which is smaller than A bits. 
More specifically, the minimum characteristic parameter
MIN is subtracted from each characteristic parameter of the
classification data and the resultant difference is divided by
$DR/2^C$.

Accordingly, each of the characteristic parameters of the
classification data is denoted in C bits. If C=1, the number
of patterns of the B characteristic parameters is $(2^1)^B$, which
is considerably smaller than the number without the ADRC
process.
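The ADRC requantization described above can be sketched as follows. This is a minimal Python illustration, not the implementation of the specification: the function names `adrc_codes` and `class_code`, the clamping of the maximum value to the largest C-bit code, the handling of a zero dynamic range, and the packing of the codes into one integer class number are all assumptions added for the example.

```python
def adrc_codes(params, c_bits=1):
    """Requantize B characteristic parameters to c_bits each.

    Follows the text: DR = MAX - MIN, and each (value - MIN) is divided
    by DR / 2**C. Clamping to 2**C - 1 is an assumption of this sketch,
    needed because value == MAX would otherwise yield 2**C.
    """
    lo, hi = min(params), max(params)
    dr = hi - lo                      # local dynamic range DR
    if dr == 0:                       # assumption: flat input maps to all zeros
        return [0] * len(params)
    step = dr / (2 ** c_bits)         # quantization step DR / 2^C
    top = 2 ** c_bits - 1             # largest code expressible in C bits
    return [min(int((p - lo) / step), top) for p in params]


def class_code(codes, c_bits=1):
    """Pack the C-bit codes into one integer used as the class (assumption)."""
    value = 0
    for code in codes:
        value = (value << c_bits) | code
    return value
```

With C = 1 and B = 4 parameters there are at most 2^4 = 16 distinct values of `class_code`, in line with the $(2^1)^B$ count given in the text.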

In respect of minimizing the number of patterns of the
characteristic parameters of the classification data, it is
desirable that B, the number of the characteristic parameters
determining the classification data, is not a large number.
However, if B is too small, the result of the classification will
be unfavorable. It is thus essential to determine B by
balancing these two considerations.

The integrated parameter construction unit 26 allows at
least some of the characteristic parameters received from the
preprocessors 20 to 23 to be integrated (or gathered) and
delivered as an integrated parameter to the adaptivity
determining unit 27.

The integrated parameter may be a group of the characteristic
parameters which are identical to those of the classification
data or a group of other characteristic parameters
than the characteristic parameters of the classification data.

The adaptivity determining unit 27 comprises a standard
parameter memory 28 and a matching block 29 and, when
receiving a class from the classification unit 25 or an
integrated parameter from the integrated parameter construction
unit 26, selectively extracts from the standard
parameter memory 28 the standard parameter table which
carries the standard parameters corresponding to the class
received from the classification unit 25.

More particularly, the standard parameter memory 28
holds as many standard parameter tables as there are
classes. Each standard parameter table contains
a group of standard parameters, e.g. for sound elements,
which can be determined by learning in a learning apparatus
(FIG. 4) described later. In the matching block 29, the
standard parameter table corresponding to the class
received from the classification unit 25 is selected.

The matching block 29 then calculates the Euclidean
distance between each of the standard parameters listed in the
selected standard parameter table and the integrated parameter
from the integrated parameter construction unit 26 and
releases as the result of voice recognition the sound element
attributed to the standard parameter which yields the smallest
Euclidean distance.
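The table selection and nearest-distance matching described above can be sketched as follows. This is a hedged illustration only: the function name `recognize`, the dictionary layout of the tables (class number to a mapping of sound element label to standard parameter vector), and the example labels and values are hypothetical and are not taken from the specification.

```python
import math


def recognize(class_id, integrated, tables):
    """Return the sound element whose standard parameter is nearest.

    tables[class_id] is assumed to map each sound element label to its
    standard parameter vector; the table is chosen by the class, and the
    match is the entry with the smallest Euclidean distance to the
    integrated parameter.
    """
    table = tables[class_id]              # select the table for this class
    best_label, best_dist = None, math.inf
    for label, standard in table.items():
        dist = math.dist(integrated, standard)   # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical tables: the same integrated parameter is matched against
# different standard parameters depending on the class.
example_tables = {
    0: {"a": (0.0, 0.0, 0.0), "i": (1.0, 1.0, 1.0)},
    1: {"a": (5.0, 5.0, 5.0), "i": (0.1, 0.1, 0.1)},
}
```

Calling `recognize(0, (0.2, 0.1, 0.0), example_tables)` and `recognize(1, (0.2, 0.1, 0.0), example_tables)` shows how the class, and therefore the table, can change the recognized sound element for an identical integrated parameter.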

Accordingly, the voice recognition apparatus of the
embodiment permits the voice of a user to be recognized not
only from the voice data picked up by the microphone 11
mainly as speech of the user but also from the image data
pictured by the CCD camera 12, such as the motion of the
mouth of the user, the audio data picked up by the microphone
13, and the other data detected by the sensor 14 such
as different types of noise and different bands of frequency,
hence increasing the rate of recognition.

Also, the apparatus allows the standard parameter table
corresponding to the class determined from
two or more data supplied from the input unit 10 to be
selected from a group of the standard parameter tables which
are assigned to their respective classes. As the optimum
standard parameter table for recognition of the voice of the user
is obtained from two or more data supplied from the input
unit 10, the rate of recognition can be further increased.

FIG. 4 illustrates an arrangement of a learning apparatus
for using a learning process to determine the standard
parameters which are registered to the standard parameter
table of each class in the standard parameter memory 28
shown in FIG. 2.

There are provided an input unit 30 (including a microphone
31, a CCD camera 32, a microphone 33, a sensor 34,
an amplifier 35, an A/D converter 36, an amplifier 37, and
A/D converters 38 and 39), preprocessors 40 to 43, a
classification data construction unit 44, a classification unit
45, and an integrated parameter construction unit 46 which
are identical in construction to the input unit 10 (including
the microphone 11, the CCD camera 12, the microphone 13,
the sensor 14, the amplifier 15, the A/D converter 16, the
amplifier 17, and the A/D converters 18 and 19), the
preprocessors 20 to 23, the classification data construction unit
24, the classification unit 25, and the integrated parameter
construction unit 26 respectively in the voice recognition
apparatus shown in FIG. 2. A memory 47 is provided with
an address terminal (AD) for receiving the class as an
address from the classification unit 45 and can save the
integrated parameter supplied from the integrated parameter
construction unit 46.

In the learning apparatus having the above mentioned
arrangement, learning data for the learning process are
introduced into the input unit 30. More specifically, the voice
of a speaker is picked up by the microphone 31. At the same
time, the mouth of the speaker is pictured by the CCD camera 32.
Moreover, the microphone 33 picks up e.g. engine sound of
vehicles, music sound from a CD player, sound of raining,
operating sound of an air-conditioner, and other ambient
noise. The sensor 34 can detect levels of the vibration and,
when the microphone 33 picks up the sound of raining,
degrees of the temperature and the moisture under the
raining.

The learning data received by the input unit 30 are then
processed in the preprocessors 40 to 43, the classification
data construction unit 44, the classification unit 45, and the
integrated parameter construction unit 46 in the same manner
as in the preprocessors 20 to 23, the classification
data construction unit 24, the classification unit 25, and the
integrated parameter construction unit 26 shown in FIG. 2.
As the result, the memory 47 is supplied with a class from
the classification unit 45 and an integrated parameter from
the integrated parameter construction unit 46.

The memory 47 saves the integrated parameter from the
integrated parameter construction unit 46 as a standard
parameter in an address assigned to the class from the
classification unit 45.

Such a process is carried out over each sound element
produced by the speaker with variations of noise and data
input picked up by the microphone 33 and the sensor 34.
Accordingly, a group of the integrated parameters of each
class are saved in the corresponding address of the memory
47.

The integrated parameters (of the group) allocated to each
address of the memory 47 are then saved in the standard
parameter memory 28 shown in FIG. 2 as the standard
parameters in a standard parameter table of the class.

In the learning apparatus, data produced with the
microphone 33 receiving a noise and data produced
without such a noise are classified into different classes by the
classification unit 45. As the result, an optimum standard
parameter table with the noise and an optimum standard
parameter table without the noise are constructed. This
allows the voice recognition apparatus shown in FIG. 2,
which releases a class from the classification unit 25 identical
to the class from the classification unit 45, to select the
optimum standard parameter table when the noise is input or
the other optimum standard parameter table when the noise is
not input.

Also, in the learning apparatus, the standard parameters
may be classified into classes depending on not only the
presence and absence of noise but also types and levels of
the noise, types of sound element produced by the speaker,
and the sex, male or female, of the speaker. The manner of
classification over the standard parameters however is not a
critical issue. When the input data of the input unit 10 in the
voice recognition apparatus is identical to that of the input
unit 30 in the learning apparatus, the result of classification
by the classification unit 25 in the voice recognition
apparatus is identical to that by the classification unit 45 in
the learning apparatus. Accordingly, the standard parameters
determined by the input data of the input unit 10 or the
optimum standard parameters to the input data can be used
for voice recognition in the voice recognition apparatus.

It may be allowed in the learning apparatus shown in FIG.
4 that a group of the integrated parameters of each class over
a sound element are saved in the memory 47. More
particularly, for learning, with a speaker yielding each sound
element under different noise conditions and a plurality of
speakers doing the same, the resultant integrated parameters
may be scattered over specific regions in a parameter space.

For example, FIG. 5(A) shows a three-dimensional
parameter space where the integrated parameter is expressed
by three components P1, P2, and P3 for ease of explanation.
When the integrated parameters of two sound elements
in one class are plotted, they are grouped in specific
regions of the parameter space.

Although all the points in the region may be regarded as the
standard parameters of each sound element, it is preferable
to determine a barycenter in the region which is then treated
as the standard parameter of the sound element as shown in
FIG. 5(B).

FIG. 6 illustrates an arrangement of a second embodiment
of the voice recognition apparatus provided with the input
unit 4 shown in FIG. 1. In the figure, like components are
denoted by like numerals as those shown in FIG. 2 and their
explanation will be omitted hereinafter. In brief, the voice
recognition apparatus of the second embodiment is substantially
identical in construction to the voice recognition
apparatus shown in FIG. 2 except that the standard parameter
memory 28 is substituted by a group of standard
parameter memories 28_1 to 28_M and a classification data
construction unit 51 and a classification unit 52 are added.

The classification data construction unit 51 constructs a
classification data from a plurality of data supplied by the
input unit 10 and delivers it to the classification unit 52. The
classification unit 52 classifies the classification data from
the classification data construction unit 51 to a corresponding
class which is then transferred as the result of classification
to the preprocessors 20 to 23.

In the preprocessors 20 to 23, preprocess actions suited
for the class from the classification unit 52 are carried out.
More particularly, when the voice data picked up by the
microphone 11 contains more voiced sounds such as vowels, linear
predictive coefficients and cepstrum coefficients are more
preferable to identify the voice than zero-cross values. When
the voice data picked up by the microphone 11 contains
more voiceless sounds such as consonants, zero-cross values,
power levels in different frequency bands, and duration
of consonant are more favorable than linear predictive
coefficients and cepstrum coefficients. When the level of
noise received by the microphone 13 is low, its effect will be
disregarded. But, if the level of noise is high, its effect
should be considered in the voice recognition. When the
mouth of a user exhibits little or no motion, its motion vector
may be unnecessary. If the mouth creates a degree of motion,
its motion vector should be considered in the voice recognition.
Furthermore, when little or no vibration of a vehicle
is generated or it is not raining, the output of the sensor 14
may be negligible. In the opposite case, the output of the
sensor 14 should count in the voice recognition.

It is hence true that the characteristic parameters optimum
for the voice recognition (for having a result of recognition
at a higher accuracy) vary depending on the other factors, not
to mention the voice itself to be recognized.

In the voice recognition apparatus shown in FIG. 6, the
classification data is constructed from data outputs of the
input unit 10 and then classified into classes. Then, the
optimum characteristic parameters for each class can be
determined by the preprocessors 20 to 23.

According to the embodiment shown in FIG. 6, the parameter
space for calculating a distance in the adaptivity determining
unit 27 (the matching unit 29) is modified according to the
class defined by the classification unit 52. The distance in
the parameter space corresponding to the class provided from
the classification unit 52 is computed by the adaptivity
determining unit 27 and a result of the voice recognition is
produced from the distance.

It is assumed herein that the classification data from the
classification data construction unit 51 are classified by the
classification unit 52 into an M number of classes.

The preprocessors 20 to 23 for determining the characteristic
parameters corresponding to the classes defined by the
classification unit 52 may be adapted to vary the degree of the
characteristic parameter (for example, a linear predictive
coefficient of the eighth or twelfth degree) or to cancel the
output of the characteristic parameters (for example, when the
vehicle stands still in a quiet location and thus the outputs
of the microphone 13 and the sensor 14 are negligible, the
preprocessors 22 and 23 can remain inactivated).

The class defined by the classification unit 52 is also
transferred to the adaptivity determining unit 27 as well as
the preprocessors 20 to 23. The adaptivity determining unit 27
includes the M standard parameter memories 28_1 to 28_M, as
described previously, which hold the standard parameters in
their respective parameter spaces corresponding to the M
classes defined by the classification unit 52.

The standard parameter memories 28_m (m=1, 2, ..., M), like the
standard parameter memory 28 shown in FIG. 2, also save the
standard parameter tables of their corresponding classes
defined by the classification unit 25.

The standard parameter tables saved in the standard parameter
memories 28_1 to 28_M can be calculated by the learning process
of another learning apparatus (FIG. 7) described later.

The adaptivity determining unit 27 upon receiving the class
from the classification unit 52 selects one of the standard
parameter memories 28_1 to 28_M which corresponds to the class
(and is thus referred to below as the selected standard
parameter memory).

The characteristic parameters from the preprocessors 20 to 23
are transferred via the classification data construction unit
24 to the classification unit 25 where they are classified. The
class as the result of classification is then supplied to the
adaptivity determining unit 27. Also, the characteristic
parameters from the preprocessors 20 to 23 are transferred to
the integrated parameter construction unit 26 where they are
converted into an integrated parameter. The integrated
parameter is constructed by the integrated parameter
construction unit 26 from the characteristic parameters which
are allocated in the parameter space identical to that of the
standard parameters listed in the standard parameter table
saved in the standard parameter memory selected by the
adaptivity determining unit 27.

The integrated parameter constructed by the integrated
parameter construction unit 26 is transferred to the adaptivity
determining unit 27. In the adaptivity determining unit 27, one
of the standard parameter tables saved in the selected standard
parameter memory is selected according to the class defined by
the classification unit 25, and the distance between each of
the standard parameters listed in the selected standard
parameter table and the integrated parameter received from the
integrated parameter construction unit 26 is calculated by the
matching unit 29. The sound element of the standard parameter
whose distance to the integrated parameter is the smallest is
thus released as a result of the voice recognition.

Accordingly, the preprocess action is carried out according to
the class defined by the classification unit 52 to determine an
optimum characteristic parameter. This allows the result of the
voice recognition to be calculated at a higher accuracy from
the distance of the optimum characteristic parameter in the
parameter space.

FIG. 7 illustrates an arrangement of a learning apparatus for
performing a learning process to determine the standard
parameters to be listed in the standard parameter table of each
class saved in each of the standard parameter memories 28_1 to
28_M shown in FIG. 6.

In the figure, like components are denoted by like numerals as
those shown in FIG. 4 and their explanation will be omitted.
The learning apparatus is substantially identical to that shown
in FIG. 4 except that the memory 47 is replaced by a group of
memories 47_1 to 47_M and an additional set of a classification
data construction unit 61, a classification unit 62, and a
selector 63 are provided.

The classification data construction unit 61 and the
classification unit 62 perform the same process actions as the
classification data construction unit 51 and the classification
unit 52 respectively. The class determined by the
classification is transferred to the preprocessors 40 to 43 and
the selector 63. The preprocess action of the preprocessors 40
to 43 is identical to that of the preprocessors 20 to 23 shown
in FIG. 6, whereby an optimum characteristic parameter
corresponding to the class defined by the classification unit
62 is determined and released.

The classification data construction unit 44, the
classification unit 45, and the integrated parameter
construction unit 46 also perform the same process actions as
the classification data construction unit 24, the
classification unit 25, and the integrated parameter
construction unit 26 respectively. Accordingly, the class from
the classification unit 45 and the integrated parameter from
the integrated parameter construction unit 46 are released.

The selector 63 in response to the class defined by the
classification unit 62 feeds a select signal to any of the chip
select (CS) terminals of the memories 47_1 to 47_M. The one of
the memories 47_1 to 47_M corresponding to the class supplied
from the classification unit 62 is thus selected.

Also, the class defined by the classification unit 45 is fed to
the address (AD) terminals of the memories 47_1 to 47_M. This
allows each integrated parameter from the integrated parameter
construction unit 46 to be saved at the address corresponding
to the class defined by the classification unit 45 in the one
of the memories 47_1 to 47_M selected according to the class
defined by the classification unit 62.

Accordingly, the integrated parameters (of a group) allocated
to each address of one of the memories 47_1 to 47_M are then
saved in a corresponding one of the standard parameter memories
28_1 to 28_M shown in FIG. 6 as the standard parameters in a
standard parameter table of the class (defined by the
classification unit 25).

It is also possible, as described with FIG. 5, that the
barycenter of the group of the integrated parameters plotted
in the space is saved as a standard parameter in each of the
standard parameter memories 28_1 to 28_M shown in FIG. 6.
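
As an illustration only, the learning flow of FIG. 7 with the barycenter option might be sketched in Python as follows. The sketch assumes that each training sample already carries the class from the classification unit 62 (here `memory_class`), the class from the classification unit 45 (here `table_class`), and a label for the sound element being learned; these names, the nested-dictionary layout, and the way labels are attached are assumptions made for the example and are not taken from the description.

```python
from collections import defaultdict


def learn_standard_parameters(samples):
    """Group integrated parameters by class and keep one barycenter per group.

    samples: iterable of (memory_class, table_class, sound_element, vector),
    where vector is a list of floats (the integrated parameter).
    Returns memories[memory_class][table_class][sound_element] -> barycenter.
    """
    # memory_class plays the role of the chip-select signal (which memory),
    # table_class plays the role of the address (which table in that memory).
    groups = defaultdict(list)
    for memory_class, table_class, sound_element, vector in samples:
        groups[(memory_class, table_class, sound_element)].append(vector)

    memories = defaultdict(lambda: defaultdict(dict))
    for (memory_class, table_class, sound_element), vectors in groups.items():
        dim = len(vectors[0])
        # Barycenter: component-wise mean of the grouped integrated parameters.
        barycenter = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        memories[memory_class][table_class][sound_element] = barycenter
    return memories


samples = [
    (0, 1, "a", [0.0, 2.0]),
    (0, 1, "a", [2.0, 4.0]),
    (1, 0, "i", [1.0, 1.0]),
]
memories = learn_standard_parameters(samples)
```

Storing only the barycenter keeps one standard parameter per group; storing every integrated parameter of the group, as the main text describes, would simply keep the lists in `groups` instead.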

FIG. 8 illustrates an arrangement of a third embodiment
of the voice recognition apparatus provided with the input
device 4 shown in FIG. 1. In the figure, like components are
denoted by like numerals as those shown in FIG. 6 and their
explanation will be omitted. The voice recognition apparatus
of this embodiment is substantially identical to that shown in
FIG. 6 except that extra preprocessors 71 to 74 are provided
and their outputs, replacing the outputs of the preprocessors
20 to 23, are supplied to the integrated parameter
construction unit 26.

The preprocessors 71 to 74 receive the same data as
received by the preprocessors 20 to 23. The class defined
by the classification unit 52 is also supplied to the
preprocessors 71 to 74.

The preprocessors 71 to 74 in response to the class
received from the classification unit 52 carry out the
preprocess action to determine optimum characteristic
parameters which are transferred to the integrated parameter
construction unit 26. It is noted that the preprocess action of
the preprocessors 71 to 74 is substantially different from that
of the preprocessors 20 to 23. More particularly, while the
outputs of the preprocessors 20 to 23 are used for ultimately
determining the class in the classification unit 25, the outputs
of the preprocessors 71 to 74 are converted into the integrated
parameter released from the integrated parameter construction
unit 26. Hence, the optimum characteristic parameters used for
classification in the classification unit 25 are calculated by
the preprocessors 20 to 23 in accordance with the class defined
by the classification unit 52 and, simultaneously, the optimum
characteristic parameters used for voice recognition are
calculated by the preprocessors 71 to 74 in accordance with the
class defined by the classification unit 52.

FIG. 9 illustrates an arrangement of a learning apparatus
for performing a learning process to determine the standard
parameters to be listed in the standard parameter table of
each class saved in each of the standard parameter memories
28_1 to 28_M shown in FIG. 8.

In the figure, like components are denoted by like numerals
as those shown in FIG. 7 and their explanation will be
omitted. The learning apparatus is substantially identical to
that shown in FIG. 7 except that an extra set of preprocessors
81 to 84 is provided and their outputs, replacing the outputs
of the preprocessors 40 to 43, are supplied to the integrated
parameter construction unit 46.

In action, the optimum characteristic parameters used for
classification in the classification unit 45 are calculated by
the preprocessors 40 to 43, like the preprocessors 20 to
23 shown in FIG. 8, in accordance with the class defined by
the classification unit 62, while the optimum characteristic
parameters used for voice recognition are calculated by the
preprocessors 81 to 84, like the preprocessors 71 to 74
shown in FIG. 8, in accordance with the class defined by the
classification unit 62.

Although the integrated parameters determined by the
learning process of the learning apparatus shown in FIG. 9
are saved in the standard parameter memories 28_1 to 28_M
shown in FIG. 8, not all of them need be saved. It is also
possible, as described with FIG. 5, that the barycenter of the
group of the integrated parameters plotted in the space is
saved as a standard parameter.

Although the present invention is described above in the form
of a voice recognition apparatus, it is also applicable to
similar apparatuses for recognizing subjects other than
speech, including images, characters, and human beings.
Although the outputs of the CCD camera 32, the microphone 33,
and the sensor 34 are used in addition to the voice of
a user to be recognized, the invention is not limited to them.

Also, in the embodiment shown in FIG. 2, the classification
data is constructed from data output of the preprocessors
20 to 23 and used for classification in the classification
unit 25. The classification data may instead be constructed
directly from outputs of the input unit 10 and used for
classification in the classification unit 25.

Although the preprocessors 20 to 23, 40 to 43, 71 to 74,
and 81 to 84 of the embodiments shown in FIGS. 6 to 9 are
supplied with the class for carrying out a preprocess action
according to the class, they may be fed with a function
related to the class so that they can perform an arithmetical
operation using the function to execute the preprocess action
in accordance with the class.
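
The class-dependent preprocess action described for FIGS. 6 to 9 could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the settings table `CLASS_SETTINGS`, the class numbering, and the use of plain autocorrelation values as a stand-in for linear predictive analysis are all hypothetical and are not specified in the description. The orders 8 and 12 and the option of leaving a preprocessor inactive are borrowed from the examples given for the embodiment of FIG. 6.

```python
def autocorrelation(frame, order):
    """Return autocorrelation values for lags 0..order (stand-in feature)."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(order + 1)
    ]


# Hypothetical mapping from class to preprocess settings.
CLASS_SETTINGS = {
    0: {"active": True, "order": 8},    # lower-degree analysis
    1: {"active": True, "order": 12},   # higher-degree analysis
    2: {"active": False, "order": 0},   # output cancelled for this class
}


def preprocess(first_class, frame):
    """Extract characteristic parameters in accordance with the class."""
    setting = CLASS_SETTINGS[first_class]
    if not setting["active"]:
        return []  # the preprocessor stays inactive and emits nothing
    return autocorrelation(frame, setting["order"])


frame = [1.0] * 16
low_order = preprocess(0, frame)
high_order = preprocess(1, frame)
cancelled = preprocess(2, frame)
```

Supplying a function related to the class, as the paragraph above suggests, would amount to storing a callable in each entry of the table rather than an order value.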

For simplifying the description of the embodiments, the 
voice recognition in the matching unit 29 is based on the 
distance between the integrated parameter and the standard 
parameter in an applicable parameter space. It is also pos- 
sible for the matching unit 29 to calculate the distance 
between the standard parameter and the integrated parameter 
specified in a time sequence and the probability of appear- 
ance of such a time sequence which are then used for 
determining the result of the voice recognition. Moreover, 
the matching unit 29 may be provided with a variety of voice 
recognition algorithms assigned to the corresponding levels 
of the class defined by the classification units 25 and 52 for 
the voice recognition. 
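
The basic distance-based matching can be sketched in Python as below. This is a minimal illustration, assuming that the standard parameter memories are held as nested dictionaries indexed first by the class from the classification unit 52 and then by the class from the classification unit 25, that each table maps a sound element to one standard parameter, and that the distance is Euclidean. The function name `recognize` and the data layout are assumptions for the example.

```python
import math


def recognize(memories, memory_class, table_class, integrated_parameter):
    """Return the sound element whose standard parameter is nearest.

    memories[memory_class][table_class] is a table mapping each sound
    element to its standard parameter (a list of floats).
    """
    table = memories[memory_class][table_class]
    # Pick the entry with the smallest Euclidean distance to the input.
    return min(
        table,
        key=lambda element: math.dist(table[element], integrated_parameter),
    )


memories = {0: {1: {"a": [0.0, 0.0], "i": [3.0, 4.0]}}}
near_i = recognize(memories, 0, 1, [2.5, 3.5])
near_a = recognize(memories, 0, 1, [0.5, 0.1])
```

The time-sequence and probability-based variants mentioned above would replace the single distance computation with a score over a sequence of integrated parameters; they are not covered by this sketch.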

It is also understood that the voice recognition apparatuses
illustrated in FIGS. 2, 6, and 8 and the learning
apparatuses illustrated in FIGS. 4, 7, and 9 may be
implemented in the form of software applications for a
microprocessor having a CPU and memories, as well as in
hardware.

Industrial Applicability 

According to a recognition apparatus defined in claim 1
and a recognition method defined in claim 5, different types
of input data are classified into classes depending on their
characteristics and are also integrated into integrated
parameters. Then, a subject is recognized using a combination
of the integrated parameter and a table which carries standard
parameters attributed to each class determined by the
classification. As the table optimum for each case is used, the
accuracy of the recognition is increased.

According to a learning apparatus defined in claim 6 and
a learning method defined in claim 9, different types of input
data are classified into classes depending on their
characteristics and are also integrated into integrated
parameters, which are then classified according to each class
determined by the classification. This allows optimum
parameters for the recognition to be constructed.

What is claimed is: 

1. A recognition apparatus for recognizing a given subject 
from different types of input data, comprising: 

a first classification data construction unit for constructing 
a first set of classification data from the different types 
of input data; 

a first classification unit for classifying the different types 
of input data into a first class based on the first set of 
classification data from the first classification data 
construction unit; 

preprocessing means for extracting a plurality of charac- 
teristic parameters from the different types of input data 
on the basis of said first class from said first classifi- 
cation unit; 
a second classification data construction unit for
constructing a second set of classification data based on the
plurality of characteristic parameters from the
preprocessing means;

a second classification unit for classifying the different
types of input data into a second class based on the
second set of classification data from the second
classification data construction unit;

an integrated parameter construction unit for constructing
an integrated parameter by integrating the plurality of
characteristic parameters from the preprocessing
means;

a plurality of standard parameter memories for storing a
plurality of tables; each standard parameter memory
corresponding to a predetermined first class; each table
containing standard parameters corresponding to a
predetermined second class; and

a matching unit for recognizing the given subject by
matching the integrated parameter with the standard
parameters from the table corresponding to the second
class and located in the standard parameter memory
corresponding to the first class.

2. The recognition apparatus according to claim 1,
wherein the given subject is a speaker of interest and the
different types of input data include at least the voice of the
speaker and an image of the speaker's mouth.

3. The recognition apparatus according to claim 1, 
wherein the different types of input data are collected by a 
corresponding plurality of different type sensors. 

4. The recognition apparatus according to claim 1,
wherein the preprocessing means comprises a first
preprocessing means for extracting a first plurality of
characteristic parameters for use by said second classification
data construction unit, and a second preprocessing means for
extracting a second plurality of characteristic parameters for
use by said integrated parameter construction unit.

5. A method of recognizing a given subject from different
types of input data, comprising the steps of:

constructing a first set of classification data from the
different types of input data using a first classification
data construction unit;

classifying the different types of input data into a first
class based on the first set of classification data using a
first classification unit;

extracting a plurality of characteristic parameters from the
different types of input data on the basis of said first
class from said first classification unit;

constructing a second set of classification data based on
the plurality of characteristic parameters from the
extracting step using a second classification data
construction unit;

classifying the different types of input data into a second
class based on the second set of classification data
using a second classification unit;

constructing an integrated parameter by integrating the
plurality of characteristic parameters from the
extracting step; and

recognizing the given subject by matching the integrated
parameter with standard parameters corresponding to
the first class and the second class; the standard
parameters being stored in a plurality of standard parameter
memories corresponding to predetermined first classes
and containing a plurality of tables corresponding to
predetermined second classes.

6. The method according to claim 5, wherein the given
subject is a speaker of interest and the different types of
input data include at least the voice of the speaker and an
image of the speaker's mouth.
7. The method according to claim 5, wherein the different
types of input data are collected by a corresponding plurality
of different type sensors.

8. The method according to claim 5, wherein the extract- 
ing step comprises a first preprocessing step of extracting a 
first plurality of characteristic parameters for use by said 
second classification data construction unit, and a second 
preprocessing step of extracting a second plurality of char- 
acteristic parameters for use in said integrated parameter 
constructing step. 

9. A voice command recognition system for operating a
voice-activated system, comprising:

an input for receiving different types of input data from a 
plurality of corresponding input sensors; the different 
types of input data comprising voice audio data and at 
least one of ambient noise data and environmental 
conditions data; 

a first classification data construction unit for constructing 
a first set of classification data from the different types 
of input data; 

a first classification unit for classifying the different types 
of input data into a first class based on the first set of 
classification data from the first classification data 
construction unit; 

an extractor for extracting a plurality of characteristic 
parameters from the different types of input data on the 
basis of said first class from said first classification unit; 

a second classification data construction unit for con- 
structing a second set of classification data based on the 
plurality of characteristic parameters from the extrac- 
tor; 

a second classification unit for classifying the different 
types of input data into a second class based on the 
second set of classification data from the second clas- 
sification data construction unit; 

an integrator for constructing an integrated parameter by 
integrating the plurality of characteristic parameters 
from the extractor; 

a plurality of standard parameter memories for storing a 
plurality of tables; each standard parameter memory 
corresponding to a predetermined first class; each table 
containing standard parameters corresponding to a pre- 
determined second class; and 

a recognizer for recognizing a command from a given 
person by matching the integrated parameter with the 
standard parameters from the table corresponding to the 
second class and located in the standard parameter 
memory corresponding to the first class. 

10. The voice command recognition system according to 
claim 9, wherein the extractor extracts a first plurality of 
characteristic parameters for use by said second classifica- 
tion data construction unit and extracts a second plurality of 
characteristic parameters for use by said integrated param- 
eter construction unit. 

11. The voice command recognition system according to 
claim 9, wherein the extractor comprises a plurality of 
preprocessors corresponding to each of the different types of 
input data and for extracting said plurality of characteristic 
parameters. 

12. The voice command recognition system according to
claim 9, wherein said recognizer recognizes said command
on the basis of the smallest Euclidean distance calculated
from each of the standard parameters in the standard
parameter table corresponding to said second class and the
integrated parameter.